Smart Paper Notes

We present the first neural network model to achieve real-time and streaming target sound extraction. To accomplish this, we propose Waveformer, an encoder-decoder architecture with a stack of dilated causal convolution layers as the encoder, and a transformer decoder layer as the decoder. This hybrid architecture uses dilated causal convolutions for processing large receptive fields in a computationally efficient manner, while also benefiting from the performance transformer-based architectures provide. Our evaluations show as much as 2.2-3.3 dB improvement in SI-SNRi compared to the prior models for this task while having a 1.2-4x smaller model size and a 1.5-2x lower runtime. Open-source code and datasets: https://github.com/vb000/Waveformer
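As a rough illustration of why a stack of dilated causal convolutions covers a large receptive field cheaply, here is a minimal pure-Python sketch. The kernel size of 3, the dilations doubling per layer, and the all-ones weights are assumptions chosen for illustration; they are not taken from the Waveformer paper or its repository.

```python
def causal_dilated_conv1d(x, weights, dilation):
    """Compute y[t] = sum_k weights[k] * x[t - k * dilation].

    Samples before the start of the signal are treated as zero, so the
    output at time t never depends on inputs later than t (causality).
    """
    out = []
    for t in range(len(x)):
        acc = 0.0
        for k, w in enumerate(weights):
            idx = t - k * dilation
            if idx >= 0:
                acc += w * x[idx]
        out.append(acc)
    return out


def receptive_field(kernel_size, dilations):
    """Number of input samples that can influence one output sample."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)


def run_stack(x, kernel_size=3, num_layers=4):
    """Apply num_layers causal convolutions with dilations 1, 2, 4, ...

    The all-ones weights are a placeholder (assumption) so that the
    impulse response is easy to inspect.
    """
    dilations = [2 ** i for i in range(num_layers)]
    for d in dilations:
        x = causal_dilated_conv1d(x, [1.0] * kernel_size, d)
    return x, dilations
```

Feeding a single impulse through `run_stack` shows both properties at once: nothing appears in the output before the impulse, and the response spans exactly `receptive_field(kernel_size, dilations)` samples, which grows exponentially with depth while the per-layer cost stays fixed.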


NeuriCam: Video Super-Resolution and Colorization Using Key Frames

Bandhav Veluri, Ali Saffari, Collin Pernu, Joshua Smith, Michael Taylor, Shyamnath Gollakota

Category: Computer Vision

2022-07-25

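We present NeuriCam, a key-frame-based video super-resolution and colorization system that achieves low-power video capture from dual-mode IoT cameras. Our idea is to design a dual-mode camera system in which the first mode is low power (1.1 mW) but outputs only grayscale, low-resolution, and noisy video, while the second mode consumes higher power (100 mW) but outputs color and higher-resolution images. To reduce total energy consumption, we run the high-power mode so that it outputs an image only once per second. The data from this camera system is then wirelessly streamed to a nearby plugged-in gateway, where we run a real-time neural network decoder to reconstruct a higher-resolution color video. To achieve this, we introduce an attention feature filter mechanism that assigns different weights to different features, based on the correlation between the feature map and the contents of the input frame at each spatial location. We design a wireless hardware prototype using off-the-shelf cameras and address practical issues including packet loss and perspective mismatch. Our evaluations show that our dual-camera hardware reduces camera energy consumption while achieving an average grayscale PSNR gain of 3.7 dB over prior video super-resolution methods and a grayscale PSNR gain of 3.7 dB over existing color propagation methods. Open-source code: https://github.com/vb000/neuricam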
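The attention feature filter described above can be pictured with a small pure-Python sketch. The abstract only states that weights depend on the correlation between feature maps and the input frame at each spatial location; the dot-product correlation, the softmax normalization, and all function names below are assumptions made for illustration, not the NeuriCam implementation.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def filter_features_at_location(query, candidates):
    """Weight candidate feature vectors by their correlation with the query.

    query:      feature vector of the input frame at one pixel (assumed).
    candidates: list of feature vectors at the same pixel, e.g. drawn
                from key-frame features (assumed).
    Returns the attention weights and the weighted sum of candidates.
    """
    # Correlation is modeled as a plain dot product (assumption).
    scores = [sum(q * c for q, c in zip(query, cand)) for cand in candidates]
    weights = softmax(scores)
    dim = len(query)
    fused = [sum(w * cand[i] for w, cand in zip(weights, candidates))
             for i in range(dim)]
    return weights, fused


def attention_feature_filter(query_map, candidate_maps):
    """Apply the per-location filter over H x W x C nested-list maps."""
    out = []
    for y, row in enumerate(query_map):
        out_row = []
        for x, q in enumerate(row):
            cands = [m[y][x] for m in candidate_maps]
            _, fused = filter_features_at_location(q, cands)
            out_row.append(fused)
        out.append(out_row)
    return out
```

In this sketch, a candidate feature that correlates strongly with the input frame at a given pixel receives a larger weight there, so each spatial location can favor a different feature source.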
